Results 1 - 20 of 181
1.
Nature; 2024 May 08.
Article in English | MEDLINE | ID: mdl-38720077

ABSTRACT

Emerging spatial computing systems seamlessly superimpose digital information on the physical environment observed by a user, enabling transformative experiences across various domains, such as entertainment, education, communication and training [1-3]. However, the widespread adoption of augmented-reality (AR) displays has been limited due to the bulky projection optics of their light engines and their inability to accurately portray three-dimensional (3D) depth cues for virtual content, among other factors [4,5]. Here we introduce a holographic AR system that overcomes these challenges using a unique combination of inverse-designed full-colour metasurface gratings, a compact dispersion-compensating waveguide geometry and artificial-intelligence-driven holography algorithms. These elements are co-designed to eliminate the need for bulky collimation optics between the spatial light modulator and the waveguide and to present vibrant, full-colour, 3D AR content in a compact device form factor. To deliver unprecedented visual quality with our prototype, we develop an innovative image formation model that combines a physically accurate waveguide model with learned components that are automatically calibrated using camera feedback. Our unique co-design of a nanophotonic metasurface waveguide and artificial-intelligence-driven holographic algorithms represents a significant advancement in creating visually compelling 3D AR experiences in a compact wearable device.

2.
Radiology; 311(2): e233270, 2024 May.
Article in English | MEDLINE | ID: mdl-38713028

ABSTRACT

Background Generating radiologic findings from chest radiographs is pivotal in medical image analysis. The emergence of OpenAI's generative pretrained transformer, GPT-4 with vision (GPT-4V), has opened new perspectives on the potential for automated image-text pair generation. However, the application of GPT-4V to real-world chest radiography is yet to be thoroughly examined. Purpose To investigate the capability of GPT-4V to generate radiologic findings from real-world chest radiographs. Materials and Methods In this retrospective study, 100 chest radiographs with free-text radiology reports were annotated by a cohort of radiologists, two attending physicians and three residents, to establish a reference standard. Of 100 chest radiographs, 50 were randomly selected from the National Institutes of Health (NIH) chest radiographic data set, and 50 were randomly selected from the Medical Imaging and Data Resource Center (MIDRC). The performance of GPT-4V at detecting imaging findings from each chest radiograph was assessed in the zero-shot setting (where it operates without prior examples) and few-shot setting (where it operates with two examples). Its outcomes were compared with the reference standard with regard to clinical conditions and their corresponding codes in the International Statistical Classification of Diseases, Tenth Revision (ICD-10), including the anatomic location (hereafter, laterality). Results In the zero-shot setting, in the task of detecting ICD-10 codes alone, GPT-4V attained an average positive predictive value (PPV) of 12.3%, average true-positive rate (TPR) of 5.8%, and average F1 score of 7.3% on the NIH data set, and an average PPV of 25.0%, average TPR of 16.8%, and average F1 score of 18.2% on the MIDRC data set. When both the ICD-10 codes and their corresponding laterality were considered, GPT-4V produced an average PPV of 7.8%, average TPR of 3.5%, and average F1 score of 4.5% on the NIH data set, and an average PPV of 10.9%, average TPR of 4.9%, and average F1 score of 6.4% on the MIDRC data set. With few-shot learning, GPT-4V showed improved performance on both data sets. When contrasting zero-shot and few-shot learning, there were improved average TPRs and F1 scores in the few-shot setting, but there was not a substantial increase in the average PPV. Conclusion Although GPT-4V has shown promise in understanding natural images, it had limited effectiveness in interpreting real-world chest radiographs. © RSNA, 2024 Supplemental material is available for this article.
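
For readers unfamiliar with how such multi-label metrics are aggregated, here is a minimal, illustrative sketch of micro-averaged PPV, TPR, and F1 over per-radiograph ICD-10 code sets; the codes below are invented examples, not study data.

```python
# Hypothetical sketch: micro-averaged PPV (precision), TPR (recall), and F1
# over ICD-10 code sets, in the spirit of the evaluation described above.

def micro_metrics(predicted: list[set], reference: list[set]):
    """Compute micro-averaged PPV, TPR, and F1 over per-image code sets."""
    tp = fp = fn = 0
    for pred, ref in zip(predicted, reference):
        tp += len(pred & ref)   # codes found in both
        fp += len(pred - ref)   # predicted but not in reference
        fn += len(ref - pred)   # in reference but missed
    ppv = tp / (tp + fp) if tp + fp else 0.0
    tpr = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * ppv * tpr / (ppv + tpr) if ppv + tpr else 0.0
    return ppv, tpr, f1

# Toy usage: two radiographs with predicted vs. reference ICD-10 codes.
preds = [{"J18.9", "J90"}, {"I50.9"}]
refs = [{"J18.9"}, {"I50.9", "J81.0"}]
print(micro_metrics(preds, refs))  # -> approx (0.667, 0.667, 0.667)
```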


Subjects
Radiography, Thoracic; Humans; Radiography, Thoracic/methods; Retrospective Studies; Female; Male; Middle Aged; Radiographic Image Interpretation, Computer-Assisted/methods; Aged; Adult
3.
J Biomed Inform; 104646, 2024 Apr 25.
Article in English | MEDLINE | ID: mdl-38677633

ABSTRACT

OBJECTIVES: Artificial intelligence (AI) systems have the potential to revolutionize clinical practices, including improving diagnostic accuracy and surgical decision-making, while also reducing costs and manpower. However, it is important to recognize that these systems may perpetuate social inequities or demonstrate biases, such as those based on race or gender. Such biases can occur before, during, or after the development of AI models, making it critical to understand and address potential biases to enable the accurate and reliable application of AI models in clinical settings. To mitigate bias concerns during model development, we surveyed recent publications on different debiasing methods in the fields of biomedical natural language processing (NLP) or computer vision (CV). We then discussed the methods, such as data perturbation and adversarial learning, that have been applied in the biomedical domain to address bias. METHODS: We searched PubMed, the ACM Digital Library, and IEEE Xplore for relevant articles published between January 2018 and December 2023 using multiple combinations of keywords. We then automatically filtered the resulting 10,041 articles with loose constraints and manually inspected the abstracts of the remaining 890 articles to identify the 55 articles included in this review. Additional articles from their references are also included in this review. We discuss each method and compare its strengths and weaknesses. Finally, we review other potential methods from the general domain that could be applied to biomedicine to address bias and improve fairness. RESULTS: Bias in biomedical AI can originate from multiple sources, such as insufficient data, sampling bias, and the use of health-irrelevant features or race-adjusted algorithms. Existing debiasing methods that focus on algorithms can be categorized as distributional or algorithmic. Distributional methods include data augmentation, data perturbation, data reweighting, and federated learning. Algorithmic approaches include unsupervised representation learning, adversarial learning, disentangled representation learning, loss-based methods, and causality-based methods.
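
A minimal sketch of one distributional method named above, data reweighting, assuming examples are weighted inversely to the frequency of their (demographic group, label) pair; the weighting scheme is illustrative, not drawn from any specific surveyed paper.

```python
# Illustrative data reweighting: examples from under-represented
# (group, label) combinations receive larger training weights.
from collections import Counter

def reweight(groups: list[str], labels: list[int]) -> list[float]:
    counts = Counter(zip(groups, labels))
    n = len(groups)
    k = len(counts)
    # Weight each example by n / (k * count) so weights average to ~1.
    return [n / (k * counts[(g, y)]) for g, y in zip(groups, labels)]

# Toy usage: the rarer combinations get larger weights.
groups = ["A", "A", "A", "B"]
labels = [1, 1, 0, 1]
print(reweight(groups, labels))  # approx [0.67, 0.67, 1.33, 1.33]
```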

4.
J Biomed Inform; 153: 104640, 2024 May.
Article in English | MEDLINE | ID: mdl-38608915

ABSTRACT

Evidence-based medicine promises to improve the quality of healthcare by empowering medical decisions and practices with the best available evidence. The rapid growth of medical evidence, which can be obtained from various sources, poses a challenge in collecting, appraising, and synthesizing the evidential information. Recent advancements in generative AI, exemplified by large language models, hold promise in facilitating the arduous task. However, developing accountable, fair, and inclusive models remains a complicated undertaking. In this perspective, we discuss the trustworthiness of generative AI in the context of automated summarization of medical evidence.


Subjects
Artificial Intelligence; Evidence-Based Medicine; Humans; Trust; Natural Language Processing
5.
J Biomed Inform; 153: 104642, 2024 May.
Article in English | MEDLINE | ID: mdl-38621641

ABSTRACT

OBJECTIVE: To develop a natural language processing (NLP) package to extract social determinants of health (SDoH) from clinical narratives, examine bias across race and gender groups, test the generalizability of SDoH extraction across disease groups, and examine the population-level extraction ratio. METHODS: We developed SDoH corpora using clinical notes identified at the University of Florida (UF) Health. We systematically compared 7 transformer-based large language models (LLMs) and developed an open-source package - SODA (i.e., SOcial DeterminAnts) - to facilitate SDoH extraction from clinical narratives. We examined the performance and potential bias of SODA for different race and gender groups, tested the generalizability of SODA using two disease domains (cancer and opioid use), and explored strategies for improvement. We applied SODA to extract 19 categories of SDoH from the breast (n = 7,971), lung (n = 11,804), and colorectal cancer (n = 6,240) cohorts to assess the patient-level extraction ratio and examine differences among race and gender groups. RESULTS: We developed an SDoH corpus using 629 clinical notes of cancer patients with annotations of 13,193 SDoH concepts/attributes from 19 categories of SDoH, and another cross-disease validation corpus using 200 notes from opioid use patients with 4,342 SDoH concepts/attributes. We compared 7 transformer models; the GatorTron model achieved the best mean average strict/lenient F1 scores of 0.9122 and 0.9367 for SDoH concept extraction and 0.9584 and 0.9593 for linking attributes to SDoH concepts. There was a small performance gap (∼4%) between males and females, but a large performance gap (>16%) among race groups. The performance dropped when we applied the cancer SDoH model to the opioid cohort; fine-tuning using a smaller opioid SDoH corpus improved the performance. The extraction ratio varied across the three cancer cohorts: 10 SDoH could be extracted from over 70% of cancer patients, but 9 SDoH could be extracted from less than 70% of cancer patients. Individuals from the White and Black groups had a higher extraction ratio than other minority race groups. CONCLUSIONS: Our SODA package achieved good performance in extracting 19 categories of SDoH from clinical narratives. The SODA package with pre-trained transformer models is available at https://github.com/uf-hobi-informatics-lab/SODA_Docker.
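
A hedged sketch of how the patient-level "extraction ratio" described above could be computed; the data structures are assumptions for illustration, not the SODA package's actual interface.

```python
# Assumed representation: extraction output as (patient_id, sdoh_category)
# pairs. The extraction ratio of a category is the fraction of cohort
# patients with at least one extracted concept of that category.
from collections import defaultdict

def extraction_ratio(extractions, cohort_ids):
    """extractions: iterable of (patient_id, sdoh_category) pairs."""
    patients_with = defaultdict(set)
    for pid, category in extractions:
        patients_with[category].add(pid)
    n = len(cohort_ids)
    return {cat: len(pids & cohort_ids) / n
            for cat, pids in patients_with.items()}

# Toy usage with a 4-patient cohort.
cohort = {"p1", "p2", "p3", "p4"}
ex = [("p1", "smoking"), ("p2", "smoking"), ("p3", "smoking"), ("p1", "housing")]
print(extraction_ratio(ex, cohort))  # {'smoking': 0.75, 'housing': 0.25}
```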


Subjects
Narration; Natural Language Processing; Social Determinants of Health; Humans; Female; Male; Bias; Electronic Health Records; Documentation/methods; Data Mining/methods
6.
J Am Med Inform Assoc; 31(5): 1163-1171, 2024 Apr 19.
Article in English | MEDLINE | ID: mdl-38471120

ABSTRACT

OBJECTIVES: Extracting PICO (Populations, Interventions, Comparison, and Outcomes) entities is fundamental to evidence retrieval. We present a novel method, PICOX, to extract overlapping PICO entities. MATERIALS AND METHODS: PICOX first identifies entities by assessing whether a word marks the beginning or conclusion of an entity. Then, it uses a multi-label classifier to assign one or more PICO labels to a span candidate. PICOX was evaluated using 1 of the best-performing baselines, EBM-NLP, and 3 more datasets, i.e., PICO-Corpus and randomized controlled trial publications on Alzheimer's Disease (AD) or COVID-19, using entity-level precision, recall, and F1 scores. RESULTS: PICOX achieved superior precision, recall, and F1 scores across the board, with the micro F1 score improving from 45.05 to 50.87 (P < .01). On the PICO-Corpus, PICOX obtained higher recall and F1 scores than the baseline and improved the micro recall score from 56.66 to 67.33. On the COVID-19 dataset, PICOX also outperformed the baseline and improved the micro F1 score from 77.10 to 80.32. On the AD dataset, PICOX demonstrated comparable F1 scores with higher precision when compared to the baseline. CONCLUSION: PICOX excels in identifying overlapping entities and consistently surpasses a leading baseline across multiple datasets. Ablation studies reveal that its data augmentation strategy effectively minimizes false positives and improves precision.
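
A simplified, illustrative sketch of the two-stage idea described (boundary detection, then multi-label span labeling), which is what permits overlapping entities; the scoring inputs stand in for learned classifiers and are not PICOX's actual implementation.

```python
# Stage 1: pair tokens flagged as possible begins/ends into candidate
# spans. Stage 2: assign zero or more PICO labels per candidate, so
# nested or overlapping spans can each carry labels.

def candidate_spans(begin_flags, end_flags):
    """Pair each begin token with every later (or equal) end token."""
    begins = [i for i, b in enumerate(begin_flags) if b]
    ends = [j for j, e in enumerate(end_flags) if e]
    return [(i, j) for i in begins for j in ends if j >= i]

def label_spans(spans, multilabel_scores, threshold=0.5):
    """multilabel_scores: dict span -> {label: probability}."""
    out = {}
    for span in spans:
        labels = {l for l, p in multilabel_scores.get(span, {}).items()
                  if p >= threshold}
        if labels:
            out[span] = labels
    return out

# Toy usage: a 3-token sentence where span (0, 2) and the nested
# span (1, 1) both qualify as Population entities.
spans = candidate_spans([1, 1, 0], [0, 1, 1])
scores = {(0, 2): {"Population": 0.9}, (1, 1): {"Population": 0.7}}
print(label_spans(spans, scores))  # {(0, 2): {'Population'}, (1, 1): {'Population'}}
```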


Subjects
Alzheimer Disease; COVID-19; Humans; Natural Language Processing
7.
JAMA Psychiatry; 2024 Mar 20.
Article in English | MEDLINE | ID: mdl-38506817

ABSTRACT

Importance: Suicide rates in the US increased by 35.6% from 2001 to 2021. Given that most individuals die on their first attempt, earlier detection and intervention are crucial. Understanding modifiable risk factors is key to effective prevention strategies. Objective: To identify distinct suicide profiles or classes, associated signs of suicidal intent, and patterns of modifiable risks for targeted prevention efforts. Design, Setting, and Participants: This cross-sectional study used data from the 2003-2020 National Violent Death Reporting System Restricted Access Database for 306 800 suicide decedents. Statistical analysis was performed from July 2022 to June 2023. Exposures: Suicide decedent profiles were determined using latent class analyses of available data on suicide circumstances, toxicology, and methods. Main Outcomes and Measures: Disclosure of recent intent, suicide note presence, and known psychotropic usage. Results: Among 306 800 suicide decedents (mean [SD] age, 46.3 [18.4] years; 239 627 males [78.1%] and 67 108 females [21.9%]), 5 profiles or classes were identified. The largest class, class 4 (97 175 [31.7%]), predominantly faced physical health challenges, followed by polysubstance problems in class 5 (58 803 [19.2%]), and crisis, alcohol-related, and intimate partner problems in class 3 (55 367 [18.0%]), mental health problems (class 2, 53 928 [17.6%]), and comorbid mental health and substance use disorders (class 1, 41 527 [13.5%]). Class 4 had the lowest rates of disclosing suicidal intent (13 952 [14.4%]) and leaving a suicide note (24 351 [25.1%]). Adjusting for covariates, compared with class 1, class 4 had the highest odds of not disclosing suicide intent (odds ratio [OR], 2.58; 95% CI, 2.51-2.66) and not leaving a suicide note (OR, 1.45; 95% CI, 1.41-1.49). Class 4 also had the lowest rates of all known psychiatric illnesses and psychotropic medications among all suicide profiles. Class 4 had more older adults (23 794 were aged 55-70 years [24.5%]; 20 100 aged ≥71 years [20.7%]), veterans (22 220 [22.9%]), widows (8633 [8.9%]), individuals with less than high school education (15 690 [16.1%]), and rural residents (23 966 [24.7%]). Conclusions and Relevance: This study identified 5 distinct suicide profiles, highlighting a need for tailored prevention strategies. Improving the detection and treatment of coexisting mental health conditions, substance and alcohol use disorders, and physical illnesses is paramount. The implementation of means restriction strategies plays a vital role in reducing suicide risks across most of the profiles, reinforcing the need for a multifaceted approach to suicide prevention.
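
For readers interested in how adjusted odds ratios like those above are typically derived, here is a hedged sketch using logistic regression on simulated data; the variable names are assumptions and no NVDRS data are reproduced.

```python
# Simulate a binary outcome (not disclosing intent) whose odds are
# higher for a hypothetical "class 4" profile, then recover the
# covariate-adjusted odds ratio and its 95% CI with statsmodels.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 1000
df = pd.DataFrame({
    "class4": rng.integers(0, 2, n),   # 1 = hypothetical profile class 4
    "age": rng.normal(46, 18, n),      # covariate
})
logit = -1.0 + 0.95 * df["class4"] + 0.01 * (df["age"] - 46)
df["no_intent"] = rng.binomial(1, 1 / (1 + np.exp(-logit)))

X = sm.add_constant(df[["class4", "age"]])
fit = sm.Logit(df["no_intent"], X).fit(disp=0)
or_est = np.exp(fit.params["class4"])          # adjusted odds ratio
or_ci = np.exp(fit.conf_int().loc["class4"])   # 95% CI bounds
print(f"OR={or_est:.2f}, 95% CI {or_ci[0]:.2f}-{or_ci[1]:.2f}")
```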

8.
JAMIA Open; 7(1): ooae021, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38455840

ABSTRACT

Objective: To automate scientific claim verification using PubMed abstracts. Materials and Methods: We developed CliVER, an end-to-end scientific Claim VERification system that leverages retrieval-augmented techniques to automatically retrieve relevant clinical trial abstracts, extract pertinent sentences, and use the PICO framework to support or refute a scientific claim. We also created an ensemble of three state-of-the-art deep learning models to classify rationales as support, refute, or neutral. We then constructed CoVERt, a new COVID VERification dataset comprising 15 PICO-encoded drug claims accompanied by 96 manually selected and labeled clinical trial abstracts that either support or refute each claim. We used CoVERt and SciFact (a public scientific claim verification dataset) to assess CliVER's performance in predicting labels. Finally, we compared CliVER to clinicians in the verification of 19 claims from 6 disease domains, using 189 648 PubMed abstracts extracted from January 2010 to October 2021. Results: In the evaluation of label prediction accuracy on CoVERt, CliVER achieved a notable F1 score of 0.92, highlighting the efficacy of the retrieval-augmented models. The ensemble model outperformed each individual state-of-the-art model, with absolute gains of 3% to 11% in F1 score. Moreover, when compared with four clinicians, CliVER achieved a precision of 79.0% for abstract retrieval, 67.4% for sentence selection, and 63.2% for label prediction. Conclusion: CliVER demonstrates its early potential to automate scientific claim verification using retrieval-augmented strategies to harness the wealth of clinical trial abstracts in PubMed. Future studies are warranted to further test its clinical utility.
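
A minimal sketch of the three-model ensemble idea, assuming each model emits a probability vector over {support, refute, neutral} that is averaged before taking the argmax; the placeholder models below are not CliVER's actual components.

```python
# Average class probabilities from several rationale classifiers and
# return the label with the highest mean probability.
import numpy as np

LABELS = ["support", "refute", "neutral"]

def ensemble_predict(models, claim: str, evidence: str) -> str:
    """models: callables returning a probability vector over LABELS."""
    probs = np.mean([m(claim, evidence) for m in models], axis=0)
    return LABELS[int(np.argmax(probs))]

# Toy usage with three fake models that disagree.
m1 = lambda c, e: np.array([0.6, 0.2, 0.2])
m2 = lambda c, e: np.array([0.3, 0.5, 0.2])
m3 = lambda c, e: np.array([0.5, 0.1, 0.4])
print(ensemble_predict([m1, m2, m3], "drug X reduces mortality", "..."))  # support
```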

9.
ArXiv; 2024 Feb 13.
Article in English | MEDLINE | ID: mdl-38529077

ABSTRACT

Objectives: Artificial intelligence (AI) systems have the potential to revolutionize clinical practices, including improving diagnostic accuracy and surgical decision-making, while also reducing costs and manpower. However, it is important to recognize that these systems may perpetuate social inequities or demonstrate biases, such as those based on race or gender. Such biases can occur before, during, or after the development of AI models, making it critical to understand and address potential biases to enable the accurate and reliable application of AI models in clinical settings. To mitigate bias concerns during model development, we surveyed recent publications on different debiasing methods in the fields of biomedical natural language processing (NLP) or computer vision (CV). We then discussed the methods, such as data perturbation and adversarial learning, that have been applied in the biomedical domain to address bias. Methods: We searched PubMed, the ACM Digital Library, and IEEE Xplore for relevant articles published between January 2018 and December 2023 using multiple combinations of keywords. We then automatically filtered the resulting 10,041 articles with loose constraints and manually inspected the abstracts of the remaining 890 articles to identify the 55 articles included in this review. Additional articles from their references are also included in this review. We discuss each method and compare its strengths and weaknesses. Finally, we review other potential methods from the general domain that could be applied to biomedicine to address bias and improve fairness. Results: Bias in biomedical AI can originate from multiple sources, such as insufficient data, sampling bias, and the use of health-irrelevant features or race-adjusted algorithms. Existing debiasing methods that focus on algorithms can be categorized as distributional or algorithmic. Distributional methods include data augmentation, data perturbation, data reweighting, and federated learning. Algorithmic approaches include unsupervised representation learning, adversarial learning, disentangled representation learning, loss-based methods, and causality-based methods.

10.
ArXiv; 2024 Apr 22.
Article in English | MEDLINE | ID: mdl-38410646

ABSTRACT

Recent studies indicate that Generative Pre-trained Transformer 4 with Vision (GPT-4V) outperforms human physicians in medical challenge tasks. However, these evaluations primarily focused on the accuracy of multi-choice questions alone. Our study extends the current scope by conducting a comprehensive analysis of GPT-4V's rationales of image comprehension, recall of medical knowledge, and step-by-step multimodal reasoning when solving New England Journal of Medicine (NEJM) Image Challenges - an imaging quiz designed to test the knowledge and diagnostic capabilities of medical professionals. Evaluation results confirmed that GPT-4V performs comparably to human physicians in multi-choice accuracy (81.6% vs. 77.8%). GPT-4V also performs well in cases that physicians answer incorrectly, with over 78% accuracy. However, we discovered that GPT-4V frequently presents flawed rationales in cases where it makes the correct final choices (35.5%), most prominently in image comprehension (27.2%). Despite GPT-4V's high accuracy on multi-choice questions, our findings emphasize the necessity for further in-depth evaluations of its rationales before integrating such multimodal AI models into clinical workflows.

11.
J Am Med Inform Assoc; 31(4): 809-819, 2024 Apr 03.
Article in English | MEDLINE | ID: mdl-38065694

ABSTRACT

OBJECTIVES: COVID-19, since its emergence in December 2019, has globally impacted research. Over 360 000 COVID-19-related manuscripts have been published on PubMed and preprint servers like medRxiv and bioRxiv, with preprints comprising about 15% of all manuscripts. Yet, the role and impact of preprints on COVID-19 research and evidence synthesis remain uncertain. MATERIALS AND METHODS: We propose a novel data-driven method for assigning weights to individual preprints in systematic reviews and meta-analyses. This weight, termed the "confidence score," is obtained using the survival cure model, also known as the survival mixture model, which takes into account the time elapsed between posting and publication of a preprint, as well as metadata such as the number of citations in the first 2 weeks, sample size, and study type. RESULTS: Using 146 preprints on COVID-19 therapeutics posted from the beginning of the pandemic through April 30, 2021, we validated the confidence scores, showing an area under the curve of 0.95 (95% CI, 0.92-0.98). Through a use case on the effectiveness of hydroxychloroquine, we demonstrated how these scores can be incorporated practically into meta-analyses to properly weigh preprints. DISCUSSION: It is important to note that our method does not aim to replace existing measures of study quality but rather serves as a supplementary measure that overcomes some limitations of current approaches. CONCLUSION: Our proposed confidence score has the potential to improve systematic reviews of evidence related to COVID-19 and other clinical conditions by providing a data-driven approach to including unpublished manuscripts.
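
A hedged sketch of a covariate-free mixture cure model of the kind described, where a fraction of preprints are never published ("cured") and the rest follow a Weibull time-to-publication; the paper's confidence score additionally conditions on metadata such as early citations, which this sketch omits.

```python
# Mixture cure model: S(t) = pi + (1 - pi) * S_u(t), with Weibull S_u.
# Published preprints contribute the uncured density; still-unpublished
# ones contribute the mixture survival (censoring).
import numpy as np
from scipy.optimize import minimize

def neg_log_lik(params, t, published):
    logit_pi, log_k, log_lam = params
    pi = 1 / (1 + np.exp(-logit_pi))              # never-published fraction
    k, lam = np.exp(log_k), np.exp(log_lam)       # Weibull shape, scale
    s_u = np.exp(-((t / lam) ** k))               # uncured survival
    f_u = (k / lam) * (t / lam) ** (k - 1) * s_u  # uncured density
    ll = np.where(published,
                  np.log((1 - pi) * f_u),         # publication observed at t
                  np.log(pi + (1 - pi) * s_u))    # still unpublished at t
    return -ll.sum()

# Simulated data: ~70% eventually publish (Weibull times), ~30% never do.
rng = np.random.default_rng(1)
n = 500
cured = rng.random(n) < 0.3
event_time = rng.weibull(1.5, n) * 120            # days to publication
follow_up = rng.uniform(30, 400, n)
published = (~cured) & (event_time <= follow_up)
t = np.where(published, event_time, follow_up)

res = minimize(neg_log_lik, x0=[0.0, 0.0, np.log(100)],
               args=(t, published), method="Nelder-Mead")
pi_hat = 1 / (1 + np.exp(-res.x[0]))
print(f"estimated never-published fraction: {pi_hat:.2f}")
```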


Subjects
COVID-19; Humans; Systematic Reviews as Topic; Research Design; PubMed; Pandemics
12.
Cancer Biomark; 39(1): 49-62, 2024.
Article in English | MEDLINE | ID: mdl-37545215

ABSTRACT

BACKGROUND: Abelson interactor 1 (ABI1) is associated with the metastasis and prognosis of many malignancies. The association between ABI1 transcript splice variants (TSVs), their molecular constitutive exons and exon-exon junctions (EEJs) in 14 cancer types, and clinical outcomes remains unresolved. OBJECTIVE: To identify novel cancer metastatic and prognostic biomarkers from ABI1 total mRNA, TSVs, and molecular constitutive elements. METHODS: Using data from TCGA and the TSVdb database, the median of ABI1 total mRNA, TSV, exon, and EEJ expression was used as a cut-off value. Kaplan-Meier analysis, the chi-squared (χ2) test, and Kendall's tau statistic were used to identify novel metastatic and prognostic biomarkers, and Cox regression analysis was performed to screen and identify independent prognostic factors. RESULTS: A total of 35 ABI1-related factors were found to be closely related to the prognosis of eight candidate cancer types. A total of 14 ABI1 TSVs and molecular constitutive elements were identified as novel metastatic and prognostic biomarkers in four cancer types. A total of 13 ABI1 molecular constitutive elements were identified as independent prognostic biomarkers in six cancer types. CONCLUSIONS: In this study, we identified 14 ABI1-related novel metastatic and prognostic markers and 21 independent prognostic factors in a total of 8 candidate cancer types.
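
An illustrative sketch of the survival workflow described (median dichotomization, Kaplan-Meier with a log-rank test, and multivariable Cox regression) using the lifelines library on simulated data; TCGA/TSVdb data retrieval is not shown and the effect sizes are invented.

```python
# Dichotomize expression at the median, compare groups with
# Kaplan-Meier / log-rank, then screen factors with a Cox model.
import numpy as np
import pandas as pd
from lifelines import KaplanMeierFitter, CoxPHFitter
from lifelines.statistics import logrank_test

rng = np.random.default_rng(0)
n = 300
df = pd.DataFrame({
    "abi1_expr": rng.lognormal(0, 1, n),
    "age": rng.normal(60, 10, n),
})
df["high_expr"] = (df["abi1_expr"] > df["abi1_expr"].median()).astype(int)
# Simulate worse survival with high expression.
df["time"] = rng.exponential(1000 / (1 + df["high_expr"]), n)
df["event"] = rng.random(n) < 0.7

# Kaplan-Meier + log-rank between expression groups.
hi, lo = df[df.high_expr == 1], df[df.high_expr == 0]
km = KaplanMeierFitter().fit(hi["time"], hi["event"], label="high ABI1")
print(km.median_survival_time_)
print(logrank_test(hi["time"], lo["time"], hi["event"], lo["event"]).p_value)

# Multivariable Cox model to screen independent prognostic factors.
cph = CoxPHFitter().fit(df[["time", "event", "high_expr", "age"]],
                        duration_col="time", event_col="event")
cph.print_summary()
```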


Subjects
Neoplasms; Humans; Prognosis; Neoplasms/genetics; Gene Expression Profiling; Biomarkers; RNA, Messenger/genetics; Biomarkers, Tumor/genetics; Cytoskeletal Proteins/genetics; Adaptor Proteins, Signal Transducing/genetics
13.
ArXiv; 2024 Apr 01.
Article in English | MEDLINE | ID: mdl-37986726

ABSTRACT

Many real-world image recognition problems, such as diagnostic medical imaging exams, are "long-tailed" - there are a few common findings followed by many more relatively rare conditions. In chest radiography, diagnosis is both a long-tailed and multi-label problem, as patients often present with multiple findings simultaneously. While researchers have begun to study the problem of long-tailed learning in medical image recognition, few have studied the interaction of label imbalance and label co-occurrence posed by long-tailed, multi-label disease classification. To engage with the research community on this emerging topic, we conducted an open challenge, CXR-LT, on long-tailed, multi-label thorax disease classification from chest X-rays (CXRs). We publicly release a large-scale benchmark dataset of over 350,000 CXRs, each labeled with at least one of 26 clinical findings following a long-tailed distribution. We synthesize common themes of top-performing solutions, providing practical recommendations for long-tailed, multi-label medical image classification. Finally, we use these insights to propose a path forward involving vision-language foundation models for few- and zero-shot disease classification.
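
A hedged sketch of one common baseline remedy for long-tailed, multi-label classification of this kind: binary cross-entropy with per-class positive weights inversely related to finding frequency. This is a generic technique for illustration, not a specific CXR-LT winning solution.

```python
# Up-weight rare findings in a multi-label BCE loss so the long tail
# contributes meaningfully to training.
import torch
import torch.nn as nn

def pos_weights_from_counts(pos_counts: torch.Tensor, n_samples: int) -> torch.Tensor:
    """weight_c = (N - n_c) / n_c, large for rare findings."""
    return (n_samples - pos_counts) / pos_counts.clamp(min=1)

n_samples, n_classes = 350_000, 26
# Toy long-tailed counts: a few common findings down to rare ones.
pos_counts = torch.logspace(5, 2, n_classes).round()  # ~100k down to ~100
criterion = nn.BCEWithLogitsLoss(
    pos_weight=pos_weights_from_counts(pos_counts, n_samples))

logits = torch.randn(8, n_classes)            # model outputs for a batch
targets = torch.randint(0, 2, (8, n_classes)).float()
print(criterion(logits, targets).item())
```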

14.
NPJ Digit Med; 6(1): 225, 2023 Dec 02.
Article in English | MEDLINE | ID: mdl-38042910

ABSTRACT

In 2020, the U.S. Department of Defense officially disclosed a set of ethical principles to guide the use of Artificial Intelligence (AI) technologies on future battlefields. Despite stark differences, there are core similarities between the military and medical service. Warriors on battlefields often face life-altering circumstances that require quick decision-making. Medical providers experience similar challenges in a rapidly changing healthcare environment, such as in the emergency department or during surgery treating a life-threatening condition. Generative AI, an emerging technology designed to efficiently generate valuable information, holds great promise. As computing power becomes more accessible and the abundance of health data, such as electronic health records, electrocardiograms, and medical images, increases, it is inevitable that healthcare will be revolutionized by this technology. Recently, generative AI has garnered considerable attention in the medical research community, leading to debates about its application in the healthcare sector, mainly due to concerns about transparency and related issues. Meanwhile, questions around the potential exacerbation of health disparities due to modeling biases have raised notable ethical concerns regarding the use of this technology in healthcare. However, the ethical principles for generative AI in healthcare have been understudied. As a result, there are no clear solutions to address ethical concerns, and decision-makers often neglect to consider the significance of ethical principles before implementing generative AI in clinical practice. In an attempt to address these issues, we explore ethical principles from the military perspective and propose the "GREAT PLEA" ethical principles, namely Governability, Reliability, Equity, Accountability, Traceability, Privacy, Lawfulness, Empathy, and Autonomy, for generative AI in healthcare. Furthermore, we introduce a framework for adopting and expanding these ethical principles in a practical way that has been useful in the military and can be applied to healthcare for generative AI, based on contrasting their ethical concerns and risks. Ultimately, we aim to proactively address the ethical dilemmas and challenges posed by the integration of generative AI into healthcare practice.

15.
Front Psychiatry; 14: 1258887, 2023.
Article in English | MEDLINE | ID: mdl-38053538

ABSTRACT

Objective: Evidence suggests that high-quality health education and effective communication within the framework of social support hold significant potential in preventing postpartum depression. Yet, developing trustworthy and engaging health education and communication materials requires extensive expertise and substantial resources. In light of this, we propose an innovative approach that involves leveraging natural language processing (NLP) to classify publicly accessible lay articles based on their relevance and subject matter to pregnancy and mental health. Materials and methods: We manually reviewed online lay articles from credible and medically validated sources to create a gold standard corpus. This manual review process categorized the articles based on their pertinence to pregnancy and related subtopics. To streamline and expand the classification procedure for relevance and topics, we employed advanced NLP models such as Random Forest, Bidirectional Encoder Representations from Transformers (BERT), and Generative Pre-trained Transformer model (gpt-3.5-turbo). Results: The gold standard corpus included 392 pregnancy-related articles. Our manual review process categorized the reading materials according to lifestyle factors associated with postpartum depression: diet, exercise, mental health, and health literacy. A BERT-based model performed best (F1 = 0.974) in an end-to-end classification of relevance and topics. In a two-step approach, given articles already classified as pregnancy-related, gpt-3.5-turbo performed best (F1 = 0.972) in classifying the above topics. Discussion: Utilizing NLP, we can guide patients to high-quality lay reading materials as cost-effective, readily available health education and communication sources. This approach allows us to scale the information delivery specifically to individuals, enhancing the relevance and impact of the materials provided.
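
A minimal sketch of the two-step setup described, with a TF-IDF plus logistic-regression baseline standing in for the BERT and gpt-3.5-turbo models used in the study; the training texts below are invented.

```python
# Step 1: classify an article as pregnancy-related or not.
# Step 2: classify related articles into the four lifestyle topics.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

relevance_clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
topic_clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))

# Tiny invented training sets.
rel_texts = ["prenatal vitamins and diet", "car maintenance tips",
             "postpartum mood changes", "stock market outlook"]
rel_labels = [1, 0, 1, 0]
topic_texts = ["prenatal vitamins and diet", "postpartum mood changes",
               "safe exercise while pregnant", "how to read a nutrition label"]
topic_labels = ["diet", "mental health", "exercise", "health literacy"]

relevance_clf.fit(rel_texts, rel_labels)
topic_clf.fit(topic_texts, topic_labels)

article = "walking routines for expecting mothers"
if relevance_clf.predict([article])[0] == 1:
    print(topic_clf.predict([article])[0])
```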

16.
Res Sq; 2023 Dec 04.
Article in English | MEDLINE | ID: mdl-38106170

ABSTRACT

Objective: While artificial intelligence (AI), particularly large language models (LLMs), offers significant potential for medicine, it raises critical concerns due to the possibility of generating factually incorrect information, leading to potential long-term risks and ethical issues. This review aims to provide a comprehensive overview of the faithfulness problem in existing research on AI in healthcare and medicine, with a focus on the analysis of the causes of unfaithful results, evaluation metrics, and mitigation methods. Materials and Methods: Using PRISMA methodology, we sourced 5,061 records from five databases (PubMed, Scopus, IEEE Xplore, ACM Digital Library, Google Scholar) published between January 2018 and March 2023. We removed duplicates and screened records based on exclusion criteria. Results: With the 40 remaining articles, we conducted a systematic review of recent developments aimed at optimizing and evaluating factuality across a variety of generative medical AI approaches. These include knowledge-grounded LLMs, text-to-text generation, multimodality-to-text generation, and automatic medical fact-checking tasks. Discussion: Current research investigating the factuality problem in medical AI is in its early stages. There are significant challenges related to data resources, backbone models, mitigation methods, and evaluation metrics. Promising opportunities exist for novel faithful medical AI research involving the adaptation of LLMs and prompt engineering. Conclusion: This comprehensive review highlights the need for further research to address the issues of reliability and factuality in medical AI, serving as both a reference and inspiration for future research into the safe, ethical use of AI in medicine and healthcare.

17.
Nat Commun; 14(1): 6261, 2023 Oct 06.
Article in English | MEDLINE | ID: mdl-37803009

ABSTRACT

Deep learning has become a popular tool for computer-aided diagnosis using medical images, sometimes matching or exceeding the performance of clinicians. However, these models can also reflect and amplify human bias, potentially resulting in inaccurate or missed diagnoses. Despite this concern, the problem of improving model fairness in medical image classification by deep learning has yet to be fully studied. To address this issue, we propose an algorithm that leverages the marginal pairwise equal opportunity to reduce bias in medical image classification. Our evaluations across four tasks using four independent large-scale cohorts demonstrate that our proposed algorithm not only improves fairness in individual and intersectional subgroups but also maintains overall performance. Specifically, the relative change in pairwise fairness difference between our proposed model and the baseline model was reduced by over 35%, while the relative change in AUC value was typically within 1%. By reducing the bias generated by deep learning models, our proposed approach can potentially alleviate concerns about the fairness and reliability of image-based computer-aided diagnosis.
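
A hedged sketch of the fairness quantity underlying the approach: pairwise differences in true-positive rate (equal opportunity) across subgroups. A training algorithm would penalize such gaps; this sketch only measures them on toy data.

```python
# Compute |TPR_i - TPR_j| for every pair of subgroups; smaller gaps
# mean better pairwise equal opportunity.
from itertools import combinations
import numpy as np

def tpr(y_true, y_pred):
    pos = y_true == 1
    return (y_pred[pos] == 1).mean() if pos.any() else np.nan

def pairwise_eo_gaps(y_true, y_pred, groups):
    """Return {(g_i, g_j): |TPR_i - TPR_j|} over all subgroup pairs."""
    rates = {g: tpr(y_true[groups == g], y_pred[groups == g])
             for g in np.unique(groups)}
    return {(a, b): abs(rates[a] - rates[b])
            for a, b in combinations(sorted(rates), 2)}

# Toy usage: predictions that are less sensitive for group "B".
y_true = np.array([1, 1, 1, 1, 0, 0, 1, 1])
y_pred = np.array([1, 1, 1, 0, 0, 1, 0, 1])
groups = np.array(["A", "A", "A", "A", "B", "B", "B", "B"])
print(pairwise_eo_gaps(y_true, y_pred, groups))  # {('A', 'B'): 0.25}
```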


Subjects
Algorithms; Diagnosis, Computer-Assisted; Humans; Reproducibility of Results; Diagnosis, Computer-Assisted/methods; Computers
18.
ArXiv; 2023 Aug 17.
Article in English | MEDLINE | ID: mdl-37791108

ABSTRACT

Pruning has emerged as a powerful technique for compressing deep neural networks, reducing memory usage and inference time without significantly affecting overall performance. However, the nuanced ways in which pruning impacts model behavior are not well understood, particularly for long-tailed, multi-label datasets commonly found in clinical settings. This knowledge gap could have dangerous implications when deploying a pruned model for diagnosis, where unexpected model behavior could impact patient well-being. To fill this gap, we perform the first analysis of pruning's effect on neural networks trained to diagnose thorax diseases from chest X-rays (CXRs). On two large CXR datasets, we examine which diseases are most affected by pruning and characterize class "forgettability" based on disease frequency and co-occurrence behavior. Further, we identify individual CXRs where uncompressed and heavily pruned models disagree, known as pruning-identified exemplars (PIEs), and conduct a human reader study to evaluate their unifying qualities. We find that radiologists perceive PIEs as having more label noise, lower image quality, and higher diagnosis difficulty. This work represents a first step toward understanding the impact of pruning on model behavior in deep long-tailed, multi-label medical image classification. All code, model weights, and data access instructions can be found at https://github.com/VITA-Group/PruneCXR.
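
A hedged sketch of the two ingredients described - magnitude pruning and collecting pruning-identified exemplars (PIEs) as inputs where the dense and pruned models disagree - using a tiny stand-in model rather than an actual CXR classifier.

```python
# Globally prune 90% of weights by L1 magnitude ("heavily pruned"),
# then flag inputs whose multi-label predictions change.
import copy
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

dense = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 14))
pruned = copy.deepcopy(dense)

params = [(m, "weight") for m in pruned if isinstance(m, nn.Linear)]
prune.global_unstructured(params, pruning_method=prune.L1Unstructured, amount=0.9)

def predict(model, x, threshold=0.5):
    with torch.no_grad():
        return (torch.sigmoid(model(x)) > threshold).int()

# PIEs: inputs where dense and pruned models disagree on any label.
x = torch.randn(100, 64)
disagree = (predict(dense, x) != predict(pruned, x)).any(dim=1)
pie_indices = torch.nonzero(disagree).flatten()
print(f"{len(pie_indices)} of {len(x)} inputs are PIEs")
```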

19.
Med Image Comput Comput Assist Interv; 14224: 663-673, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37829549

ABSTRACT

Pruning has emerged as a powerful technique for compressing deep neural networks, reducing memory usage and inference time without significantly affecting overall performance. However, the nuanced ways in which pruning impacts model behavior are not well understood, particularly for long-tailed, multi-label datasets commonly found in clinical settings. This knowledge gap could have dangerous implications when deploying a pruned model for diagnosis, where unexpected model behavior could impact patient well-being. To fill this gap, we perform the first analysis of pruning's effect on neural networks trained to diagnose thorax diseases from chest X-rays (CXRs). On two large CXR datasets, we examine which diseases are most affected by pruning and characterize class "forgettability" based on disease frequency and co-occurrence behavior. Further, we identify individual CXRs where uncompressed and heavily pruned models disagree, known as pruning-identified exemplars (PIEs), and conduct a human reader study to evaluate their unifying qualities. We find that radiologists perceive PIEs as having more label noise, lower image quality, and higher diagnosis difficulty. This work represents a first step toward understanding the impact of pruning on model behavior in deep long-tailed, multi-label medical image classification. All code, model weights, and data access instructions can be found at https://github.com/VITA-Group/PruneCXR.

20.
Proc Conf Assoc Comput Linguist Meet; 2023: 12532-12555, 2023 Jul.
Article in English | MEDLINE | ID: mdl-37701928

ABSTRACT

A human decision-maker benefits the most from an AI assistant that corrects for their biases. For problems such as generating an interpretation of a radiology report given findings, a system predicting only highly likely outcomes may be less useful, since such outcomes are already obvious to the user. To alleviate biases in human decision-making, it is worth considering a broad differential diagnosis, going beyond the most likely options. We introduce a new task, "less likely brainstorming," that asks a model to generate outputs that humans think are relevant but less likely to happen. We explore the task in two settings: a brain MRI interpretation generation setting and an everyday commonsense reasoning setting. We found that a baseline approach of training with less likely hypotheses as targets generates outputs that humans evaluate as either likely or irrelevant nearly half of the time; standard maximum likelihood (MLE) training is not effective. To tackle this problem, we propose a controlled text generation method that uses a novel contrastive learning strategy to encourage models to differentiate between generating likely and less likely outputs according to humans. We compare our method with several state-of-the-art controlled text generation models via automatic and human evaluations and show that our models' capability of generating less likely outputs is improved.
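
A hedged sketch of a margin-based contrastive objective in the spirit of the method described: the model's score for a "less likely" target is pushed above its score for a "likely" target by a margin when the brainstorming control is active. The formulation is an assumption for illustration, not the paper's exact loss.

```python
# Hinge-style contrastive loss over sequence scores (e.g., length-
# normalized log-probabilities) for less-likely vs. likely targets.
import torch
import torch.nn.functional as F

def contrastive_brainstorm_loss(score_less_likely: torch.Tensor,
                                score_likely: torch.Tensor,
                                margin: float = 1.0) -> torch.Tensor:
    """Want score_less_likely >= score_likely + margin when the model
    is asked to generate less likely outputs."""
    return F.relu(margin + score_likely - score_less_likely).mean()

# Toy usage: a batch of scores under the "less likely" control signal.
score_ll = torch.tensor([-2.0, -1.0, -3.0])   # less-likely hypotheses
score_l = torch.tensor([-1.5, -2.5, -1.0])    # likely hypotheses
print(contrastive_brainstorm_loss(score_ll, score_l))  # tensor(1.5000)
```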
